Goto

Collaborating Authors

 tail probability


Sharp Concentration Inequalities: Phase Transition and Mixing of Orlicz Tails with Variance

arXiv.org Machine Learning

In this work, we investigate how to develop sharp concentration inequalities for sub-Weibull random variables, including sub-Gaussian and sub-exponential distributions. Although the random variables may not be sub-Guassian, the tail probability around the origin behaves as if they were sub-Gaussian, and the tail probability decays align with the Orlicz $ฮจ_ฮฑ$-tail elsewhere. Specifically, for independent and identically distributed (i.i.d.) $\{X_i\}_{i=1}^n$ with finite Orlicz norm $\|X\|_{ฮจ_ฮฑ}$, our theory unveils that there is an interesting phase transition at $ฮฑ= 2$ in that $\PPล‚(ล‚|\sum_{i=1}^n X_i \r| \geq t\r)$ with $t > 0$ is upper bounded by $2\expล‚(-C\maxล‚\{\frac{t^2}{n\|X\|_{ฮจ_ฮฑ}^2},\frac{t^ฮฑ}{ n^{ฮฑ-1} \|X\|_{ฮจ_ฮฑ}^ฮฑ}\r\}\r)$ for $ฮฑ\geq 2$, and by $2\expล‚(-C\minล‚\{\frac{t^2}{n\|X\|_{ฮจ_ฮฑ}^2},\frac{t^ฮฑ}{ n^{ฮฑ-1} \|X\|_{ฮจ_ฮฑ}^ฮฑ}\r\}\r)$ for $1\leq ฮฑ\leq 2$ with some positive constant $C$. In many scenarios, it is often necessary to distinguish the standard deviation from the Orlicz norm when the latter can exceed the former greatly. To accommodate this, we build a new theoretical analysis framework, and our sharp, flexible concentration inequalities involve the variance and a mixing of Orlicz $ฮจ_ฮฑ$-tails through the min and max functions. Our theory yields new, improved concentration inequalities even for the cases of sub-Gaussian and sub-exponential distributions with $ฮฑ= 2$ and $1$, respectively. We further demonstrate our theory on martingales, random vectors, random matrices, and covariance matrix estimation. These sharp concentration inequalities can empower more precise non-asymptotic analyses across different statistical and machine learning applications.





M-Statistic for Kernel Change-Point Detection

Neural Information Processing Systems

Detecting the emergence of an abrupt change-point is a classic problem in statistics and machine learning. Kernel-based nonparametric statistics have been proposed for this task which make fewer assumptions on the distributions than traditional parametric approach. However, none of the existing kernel statistics has provided a computationally efficient way to characterize the extremal behavior of the statistic. Such characterization is crucial for setting the detection threshold, to control the significance level in the offline case as well as the average run length in the online case. In this paper we propose two related computationally efficient M -statistics for kernel-based change-point detection when the amount of background data is large. A novel theoretical result of the paper is the characterization of the tail probability of these statistics using a new technique based on change-of-measure. Such characterization provides us accurate detection thresholds for both offline and online cases in computationally efficient manner, without the need to resort to the more expensive simulations such as bootstrapping. We show that our methods perform well in both synthetic and real world data.




A Proofs of Theoretical Results

Neural Information Processing Systems

Lemma 1. F or any embedding f and finite N, we have L Theorem 3. F or any embedding f and finite N and M, we have e L By Jensen's inequality, we may push the absolute value inside the expectation to see that The outer expectation disappears since the tail probably bound of Theorem A.2 holds uniformly for all fixed x, x We still owe the reader a proof of Lemma A.2, which we give now. We then proceed to bound the right hand tail probability. Combining Lemma A.3 and Lemma A.4, with probability at least 1, for all f 2F, we have L Note the definition of g is slightly modified in this context. We again use the Adam optimizer with learning rate 0 . To implement the debiased objective, we only modify the "src/s2v-model.py"